Use of Lexical Statistics for Compound Word Recognition and Segmentation in Turkish
Abstract
Compound words are a cross-linguistic morphological phenomenon that occurs in all languages. Compound words are widely accepted to be stored in the lexicon, but their constituents need to be accessed during both language learning and production. This study investigated how corpora can be used to differentiate single-stem words from single-word compounds, and then to segment compound words when no phonological information is available. Stems and morphs discovered in manual segmentations of the METU-Sabancı Turkish Treebank and CHILDES were employed in the compound word recognition task, and the results were compared. The METU Turkish Corpus (about 2 million words) and a web corpus (about 490 million Turkish words) were used in the segmentation task. The results suggest that the lexicon can be morpheme-based and that lexical frequencies are effective heuristics in compound word recognition and segmentation.
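To make the idea of frequency-driven segmentation concrete, here is a minimal sketch in Python. It is not the method used in the study: the scoring rule (mean log frequency of the two parts), the function name `split_compound`, the `min_len` parameter, and the toy counts in `TOY_FREQ` are all assumptions chosen for illustration. A real experiment would take its counts from a corpus such as the ones named in the abstract.

```python
from math import log

# Hypothetical toy counts for illustration only; not taken from any corpus.
TOY_FREQ = {
    "bilgi": 4000,
    "sayar": 150,
    "baş": 5000,
    "bakan": 1200,
    "kitap": 3500,
}


def split_compound(word, freq, min_len=2):
    """Return the best (left, right) split of `word`, or None.

    A split is a candidate only if both parts appear in `freq`.
    Candidates are ranked by the mean log frequency of the two parts,
    which is an assumed heuristic, not the scoring used in the paper.
    """
    best_split, best_score = None, float("-inf")
    # Try every split point that leaves at least `min_len` characters per side.
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in freq and right in freq:
            score = (log(freq[left]) + log(freq[right])) / 2
            if score > best_score:
                best_split, best_score = (left, right), score
    return best_split


print(split_compound("bilgisayar", TOY_FREQ))  # ('bilgi', 'sayar')
print(split_compound("başbakan", TOY_FREQ))    # ('baş', 'bakan')
print(split_compound("kitap", TOY_FREQ))       # None
```

A word for which no split into two known parts exists, such as "kitap" here, is treated as a single-stem word. The sketch works on spelling alone and uses no phonological information, which matches the setting the abstract describes.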
Similar resources
Language development and lexical awareness of bilingual (Azeri-Persian) hard of hearing impaired children
The Relationship between Mean Length of Utterance (MLU), Lexical Richness, and Syntactic and Lexical Metalinguistic Awareness in Bilingual (Turkish-Persian) Normal and Hearing-Impaired Children. Objectives: Given the impact of hearing loss on language development and metalinguistic skill, and given that language development differs from metalinguistic skill in bilingual children, studying of...
Word-Forming Process in Azeri Turkish Language
This study examines the general methods of natural word formation in the Azeri Turkish language. It pursues this aim by analyzing the construction of compound Azeri Turkish words. Same’ei (2016) did a comprehensive study on the word-forming process in Farsi, which inspired this study of word formation in Azeri Turkish. Numerous scholars had done vari...
Written word recognition by the elementary and advanced level Persian-English bilinguals
According to a basic prediction made by the Revised Hierarchical Model (RHM), at early stages of language acquisition, strong L2-L1 lexical links are formed. RHM predicts that these links weaken with increasing proficiency, although they do not disappear even at higher levels of language development. To test this prediction, two groups of highly proficie...
Turkish Resources for Visual Word Recognition
We report two tools for conducting psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. The Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified version of the w...
Lexical stress in continuous speech recognition
Human listeners use lexical stress for word segmentation and disambiguation. We look into using lexical stress for large-vocabulary speech recognition for the Dutch language. It appears that, besides vowels, consonants should be taken into account. By introducing stressed phonemes, and features for spectral bands and the fundamental frequency, we reduce the word error rate by 2.6%.
Publication date: 2015